Establishment of a Latin American dataset to enable the construction of gestational weight gain charts for adolescents

Gestational weight gain is an important indicator for monitoring nutritional status during pregnancy. However, to date there are no gestational weight gain references created for adolescents, nor national datasets that would enable the construction of such charts. This manuscript aims to describe the creation of a Latin American dataset to construct gestational weight gain references for adolescents aged 10–19 years old. Gestational weight gain data from studies conducted in nine countries (Argentina, Brazil, Chile, Colombia, Mexico, Panama, Paraguay, Peru, and Uruguay) collected between 2003 and 2021 were harmonized. Data on height, weight, and gestational age in at least two gestational trimesters were included. Pregnant adolescents had to be free of diseases that could affect weight, and newborns had to weigh between 2,500 and 4,000 g and be free of congenital malformations. The final dataset included 6,414 individuals after data cleaning. Heterogeneity between the countries was assessed by calculating standardized site differences for gestational weight gain and z scores of height-for-age. Several imputation procedures were tested, and approximately 10% of the first-trimester weights were imputed. The prevalence of individuals with underweight (1.5%) and obesity (5.3%) was low, which may lead to problems when modeling the curves for such BMI categories. Maternal height and gestational weight gain did not show significant differences by country, according to the standardized site differences. A harmonized dataset of nine countries with imputed data in the first trimester of pregnancy was prepared to construct Latin American gestational weight gain curves for adolescents.


Introduction
Anthropometry is a suitable technique to assess adolescent individuals' growth and nutritional status [1]. To date, there are no guidelines for monitoring gestational weight gain (GWG) during the prenatal care of adolescent mothers [2,3]. In Latin America, GWG references based on data from adult pregnant individuals from Chile, Argentina, Uruguay, or the United States have been used interchangeably on adolescents [4,5]. However, such references designed for adults are not recommended for adolescent use due to differences in this age group's physiological condition and social determinants of health and nutrition [4,6,7].
Experts have previously recommended that anthropometric references designed to assess GWG should result from longitudinal studies of selected populations with a low incidence of maternal and fetal complications [8]. Additionally, these references should consider the prepregnancy body mass index (BMI) status [8,9]. In those studies, anthropometric measurements should be collected before and during pregnancy and childbirth [10]. There is no comprehensive dataset to construct such charts for adolescents in Latin American countries. To the best of our knowledge, there are no initiatives in those countries to explore GWG data of adolescents and particular characteristics these groups may have, such as late prenatal care initiation and continuing height growth during pregnancy.
Thus, this study aims to describe the data acquisition and harmonization efforts, and the methodological challenges faced, when creating a dataset to enable the construction of GWG charts for adolescents living in Latin America.

Study design
This is a retrospective longitudinal study based on the secondary analysis of harmonized datasets from nine Latin American countries (Argentina, Brazil, Chile, Colombia, Mexico, Panama, Paraguay, Peru, and Uruguay). Researchers and officials from health ministries or institutions were invited to participate in the initiative and provide datasets for analyses. Participants were selected based on their previous publications in the field and institutional affiliations.
A standardized form was used to gather the necessary data from each study and decide on its potential inclusion in the combined dataset. Each invited researcher provided the following details about their studies: the origin of the study, city and country of data collection, source of data collection (primary or secondary), the sample size of the dataset, maternal age or date of birth, the total number of prenatal care visits for each individual, presence of obstetric and previous diseases (such as diabetes mellitus or hypertension), weight and height measurements, gestational age at prenatal care visits, and birth weight.
The forms were evaluated by the core team of researchers of the project, and the datasets from studies that included the variables necessary for the construction of the charts were requested. Over 30 people from 13 countries were invited, and those from the nine countries listed above answered the form and were included in this project. Then, the data of 33,446 pregnant individuals with approximately 150,000 weight measurements were consolidated, and the eligibility criteria were applied. These 150,000 measurements referred to individuals who gave birth to a live child without congenital malformations.

Eligibility criteria
To be included in the study, adolescents had to: be between 10 and 19 years old (verified by date of birth and conception); have at least one prenatal weight (kg) in the second and third trimesters, with their respective gestational age (weeks); and give birth at term (37 to 42 weeks) to a newborn with a birth weight between 2,500 and 4,000 g. Individuals also needed to be free of hypertension, preeclampsia, diabetes mellitus or gestational diabetes, tuberculosis, and cardiovascular diseases. Adolescents with a height-for-age z score < -2 on the WHO charts were classified as having stunting [1] and were removed from the analyses. Only pregnant individuals without missing data in the variables of interest (maternal age, pre-pregnancy weight, height, pre-pregnancy BMI, cumulative GWG at each visit, gestational age at each visit, height-for-age, and birth weight and length) were included.
All selected datasets were individually and carefully reviewed during the data-cleaning process. Pregnant adolescents with GWG ≥ 30 kg and those with a biologically impossible weight gain trajectory were excluded. Such trajectories were identified manually by checking the sequence of each adolescent's weights during pregnancy. The weight gain of each adolescent was graphed, the weights with errors were identified, and weight gains or losses that did not follow the pattern observed for that particular pregnant individual were removed (S1 Fig). Finally, we removed individuals who did not have pre-pregnancy weight or BMI data. At the end of this process, data from 6,414 adolescent pregnant individuals and 34,943 weight records were available to construct the GWG charts.
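The inclusion rules above can be read as a single predicate applied to each record. The following is only a minimal sketch: the `Record` class and all of its field names are hypothetical, age is assumed to be stored in completed years, and "at least one weight in the second and third trimesters" is assumed to mean at least one in each of those trimesters. The checks for missing data and for implausible trajectories are not covered.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Record:
    # Hypothetical field names; the source datasets may label these differently.
    age_completed_years: int
    gestational_age_at_birth_weeks: float
    birth_weight_g: float
    height_for_age_z: float
    has_excluding_condition: bool  # hypertension, preeclampsia, diabetes, etc.
    n_weights_second_trimester: int
    n_weights_third_trimester: int


def is_eligible(r: Record) -> bool:
    """Return True when a record meets the eligibility criteria listed in the text."""
    return (
        10 <= r.age_completed_years <= 19
        # Assumption: one or more weights in EACH of the two trimesters.
        and r.n_weights_second_trimester >= 1
        and r.n_weights_third_trimester >= 1
        and 37 <= r.gestational_age_at_birth_weeks <= 42   # term birth
        and 2500 <= r.birth_weight_g <= 4000
        and not r.has_excluding_condition
        and r.height_for_age_z >= -2                        # stunting excluded
    )


example = Record(
    age_completed_years=16,
    gestational_age_at_birth_weeks=39,
    birth_weight_g=3200,
    height_for_age_z=-0.5,
    has_excluding_condition=False,
    n_weights_second_trimester=1,
    n_weights_third_trimester=2,
)
low_birth_weight = replace(example, birth_weight_g=2400)
print(is_eligible(example), is_eligible(low_birth_weight))
```

Expressing the criteria as one function keeps the rules in a single auditable place, which may help when the same filter has to be applied to datasets from several countries.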

Main variables
Maternal age (years) was estimated as the difference between the date of birth of the pregnant individual and the date of conception. The date of conception was determined based on an algorithm from the 'ob Wheel' available at https://obwheel.quartertone.net/.
Pre-pregnancy weight was obtained from three different sources, i.e., measured, abstracted from medical records, or self-reported. Height was measured at enrollment in the study or the start of prenatal care. These two variables were used to calculate pre-pregnancy BMI (kg/m²). Pre-pregnancy BMI was classified according to the WHO charts z scores as underweight, < -2 Standard Deviation (SD); normal weight, ≥ -2 SD to ≤ +1 SD; overweight, > +1 SD to ≤ +2 SD; and obesity, > +2 SD [1]. Cumulative GWG (kg) at each visit was calculated as the difference between the weight at that visit and the pre-pregnancy weight.
Gestational age at each visit was already available in the datasets and was not recalculated. Height-for-age z scores were calculated using the WHO AnthroPlus software. Birth weight (g) and length (cm) were also available in each dataset, but it was not possible to know the origin of those measurements (i.e., whether they were measured in the study, abstracted from medical records, or reported by the mother).
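The derived variables described above reduce to simple arithmetic, sketched here under stated assumptions. The function names are illustrative, and the BMI-for-age z score is taken as an input because computing it requires the WHO reference tables, which are not reproduced here.

```python
from typing import List


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def classify_bmi_z(bmi_for_age_z: float) -> str:
    """Map a BMI-for-age z score to the categories used in the text."""
    if bmi_for_age_z < -2:
        return "underweight"
    if bmi_for_age_z <= 1:
        return "normal weight"
    if bmi_for_age_z <= 2:
        return "overweight"
    return "obesity"


def cumulative_gwg(prepregnancy_weight_kg: float,
                   visit_weights_kg: List[float]) -> List[float]:
    """Cumulative gain at each visit: visit weight minus pre-pregnancy weight."""
    return [w - prepregnancy_weight_kg for w in visit_weights_kg]


print(bmi(55.0, 1.60))
print(classify_bmi_z(0.3))
print(cumulative_gwg(55.0, [56.0, 60.5, 66.0]))
```

Note that the boundaries are closed on the normal-weight side at -2 SD and +1 SD, and on the overweight side at +2 SD, matching the cut-offs stated in the text.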

Statistical analysis
Heterogeneity assessment. The heterogeneity of GWG and height-for-age z scores according to the pre-pregnancy BMI category across the nine countries was assessed by calculating the Standardized Site Differences (SSD). This method consists of calculating z scores for the means of GWG in gestational age groups (0-7, 8-14, 15-21, 22-28, 29-35, and 36-42 weeks) and z scores of height-for-age relative to the pooled mean and SD in each group or age [11]. According to Cohen [12], differences of 0.2 SD units are considered small, 0.5 SD units are acceptable, and 0.8 SD units are large. Thus, SSD values within the range of ± 0.5 units were considered homogeneous, indicating that the data could be combined into a single dataset to construct the charts.
We also performed a sensitivity analysis for the heterogeneity assessment of GWG, excluding studies with fewer than ten records in the selected age groupings. This procedure was necessary to evaluate whether smaller datasets contributed disproportionately to the observed heterogeneity because of their sample size rather than because of true differences between the GWG of the adolescents from those countries.
First-trimester missing data. The distribution of GWG was evaluated according to gestational age for each pre-pregnancy BMI category. Approximately 10% of GWG records did not present values between the fifth and thirteenth weeks of pregnancy (first trimester). This proportion of missing data may lead to problems when modeling the GWG curves. For this reason, we decided to impute the weight in this period. We assumed that the data were missing not at random (MNAR). This decision considered that the lack of weight measurement in this period was related to a late beginning of prenatal care, which could not be explained by other variables available in the dataset. Several statistical techniques to deal with this issue were tested.
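As an illustration of the SSD calculation and the sensitivity analysis described above, the sketch below computes, for each site and gestational age group, the site mean minus the pooled mean, divided by the pooled SD. The function names, the tuple layout, and the toy values are assumptions. It also assumes that the pooled mean and SD are computed from all records, including those of the small cells that the `min_records` option leaves out of the output; the text does not specify this detail.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, Iterable, Tuple

Cell = Tuple[str, str]  # (site, gestational age group)


def standardized_site_differences(
    records: Iterable[Tuple[str, str, float]],
    min_records: int = 1,
) -> Dict[Cell, float]:
    """SSD per (site, group): (site mean - pooled mean) / pooled SD."""
    by_group = defaultdict(list)
    by_cell = defaultdict(list)
    for site, group, value in records:
        by_group[group].append(value)
        by_cell[(site, group)].append(value)

    ssd = {}
    for (site, group), values in by_cell.items():
        if len(values) < min_records:
            continue  # sensitivity analysis: skip sparsely populated cells
        pooled = by_group[group]
        ssd[(site, group)] = (mean(values) - mean(pooled)) / stdev(pooled)
    return ssd


def is_homogeneous(ssd_value: float, limit: float = 0.5) -> bool:
    """Homogeneity rule from the text: SSD within +/- 0.5 units."""
    return abs(ssd_value) <= limit


# Toy values for illustration only; these are not study data.
toy = [
    ("A", "29-35", 10.0), ("A", "29-35", 12.0),
    ("B", "29-35", 14.0), ("B", "29-35", 16.0),
]
result = standardized_site_differences(toy)
for cell, value in sorted(result.items()):
    print(cell, round(value, 3), is_homogeneous(value))
```

Calling the function a second time with `min_records=10` would mimic the sensitivity analysis, so the two sets of SSD values can be compared side by side.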
First, an initial data point for individuals with missing data was defined for a random week between 5 and 13. Subsequently, imputation methods were applied for univariate missing data (weight in the randomly selected week). These methods were chosen considering their adaptations to deal with MNAR. Predictive mean matching [13], random forest [14], classification and regression trees [15], random indicator [16], and imputation under the normal linear model with bootstrap [17] were considered. This latter procedure was selected. The method calculates univariate imputations by drawing a bootstrap sample from the complete part of the data (individuals with first-trimester weight values) and subsequently taking the least-squares estimates from the linear model given the bootstrap sample, which incorporates the variability of the sampling into the parameters. It does not include the other weight measurements the individual may have, but it is possible to include other variables in the regression model [17]. The adolescent's id, gestational age at each antenatal visit (weeks), and the number of antenatal visits during pregnancy were included as covariates in the normal linear model. We performed 50 iterations with five imputed datasets (m = 5), and Rubin's rules were used to combine the estimates [18].
We constructed graphs using generalized additive models for location, scale and shape (GAMLSS) [19] for each imputation method to compare the distribution of the first-trimester GWG according to gestational age. Two criteria were used to select the best imputation method. The first was statistical: model fit was evaluated with the GAIC, AIC, and global deviance indicators [20]; the assumptions of normality and homoscedasticity were verified; and model specification was checked by quantile comparison [21]. The second was the judgment of a panel of experts in prenatal nutrition, who assessed the biological plausibility of the data obtained in each estimated model and reviewed the data of each pregnant individual across the three trimesters of pregnancy. The panel placed special emphasis on negative weight gains between trimesters and on the extent of any weight loss over the course of the pregnancy (between the first and third trimesters), and also considered the smoothness of the curves in the first trimester and the width of the percentile channels.
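To make the selected procedure more concrete, the sketch below illustrates the general idea of imputation under a normal linear model with bootstrap, followed by pooling with Rubin's rules. It is not the authors' implementation: it uses a single covariate (gestational week) instead of the three covariates named above, plain Python instead of a dedicated imputation package, and all function names and values are illustrative assumptions.

```python
import random
from statistics import mean, variance
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares intercept, slope, and residual SD for y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, (sse / (len(xs) - 2)) ** 0.5


def impute_norm_boot(observed: List[Tuple[float, float]],
                     missing_x: List[float],
                     rng: random.Random) -> List[float]:
    """One imputation: bootstrap the complete cases, refit, predict with noise.

    Requires at least three complete cases with two or more distinct x values.
    """
    while True:
        sample = [rng.choice(observed) for _ in observed]
        if len({x for x, _ in sample}) > 1:  # the slope needs two distinct x
            break
    xs, ys = zip(*sample)
    a, b, sigma = fit_line(xs, ys)
    return [a + b * x + rng.gauss(0.0, sigma) for x in missing_x]


def pool_rubin(estimates: List[float], variances: List[float]) -> Tuple[float, float]:
    """Rubin's rules: pooled estimate and total variance over m imputations."""
    m = len(estimates)
    within = mean(variances)
    between = variance(estimates)
    return mean(estimates), within + (1 + 1 / m) * between


# Toy complete cases (week, GWG in kg); illustration only, not study data.
observed = [(float(week), 0.2 * week) for week in range(5, 14)]
rng = random.Random(0)
imputations = [impute_norm_boot(observed, [6.0, 9.0], rng) for _ in range(5)]  # m = 5
print(imputations[0])
print(pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
```

Refitting the model on a fresh bootstrap sample for every imputation is what carries the uncertainty of the regression parameters into the imputed values. The between-imputation term in `pool_rubin` then reflects that uncertainty in the pooled variance.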
Descriptive analyses. Medians and interquartile ranges for continuous variables were calculated according to pre-pregnancy BMI category, country, antenatal visits, and trimesters. The analyses were performed in Python v3.8.2 and R v4.0.3.
Ethics. This study was approved by the Bioethics Committee of the Faculty of Dentistry of the Universidad de Antioquia, act number 12 of October 18, 2019. Only de-identified data were used in the analyses, which were performed by authorized investigators. The principal investigator of the University of Antioquia signed a confidentiality and custody agreement for the data with the investigator or institutional representative of each country.
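A grouped median and interquartile range of the kind described above could be computed as in the following sketch. The grouping key, the function name, and the choice of the "inclusive" quartile method are assumptions, since the text does not state which quartile definition was used.

```python
from collections import defaultdict
from statistics import median, quantiles
from typing import Dict, Hashable, Iterable, Tuple


def median_iqr_by_group(
    rows: Iterable[Tuple[Hashable, float]],
) -> Dict[Hashable, Tuple[float, float, float]]:
    """Return {group: (median, first quartile, third quartile)}."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)

    summary = {}
    for key, values in groups.items():
        if len(values) < 2:
            # A single observation has no spread; report it for all three.
            summary[key] = (values[0], values[0], values[0])
            continue
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        summary[key] = (median(values), q1, q3)
    return summary


# Toy rows (group label, GWG in kg); illustration only, not study data.
rows = [("normal weight", v) for v in (1.0, 2.0, 3.0, 4.0, 5.0)]
rows.append(("obesity", 0.5))
summary = median_iqr_by_group(rows)
print(summary)
```

The group key can be any hashable value, so a tuple such as (country, BMI category, trimester) would support the cross-classifications mentioned in the text.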

Results
The data included in this analysis were collected between 2003 and 2021 and came from different sources: 13.7% from research projects, 68.3% from information systems, and 18.0% from maternal and perinatal care institutions' medical records (S1 Table in S1 File).
Height, pre-pregnancy weight, and third-trimester weight varied between the countries. Adolescents from Brazil, Chile, and Uruguay showed the highest median heights compared to those from Panama and Peru. The median pre-pregnancy and third-trimester weights were higher for Chilean individuals and lower for Peruvians compared to the other countries. The median birth weight was lower among Mexican adolescents (Table 1).
No heterogeneity was observed for GWG according to gestational age and height-for-age z scores (Figs 2 and 3). Most of the SSD values for both indicators fell between -0.5 and +0.5. For GWG, the SSDs fell farthest outside this interval for individuals with underweight from Mexico at 36-42 weeks, for those with overweight from Peru at 22-28 weeks, and for those with obesity from Mexico at 15-21 gestational weeks. When removing the countries with < 10 observations in each selected gestational age interval, all the values fell in the expected range, suggesting that the low sample size could contribute to the observed heterogeneity (S2 Fig). Based on that, we decided not to remove the observations of the countries with low sample sizes.
For the first trimester, individuals from Colombia classified as normal weight, overweight, or obesity had a lower median GWG (0 kg) than those from the other countries. A negative median GWG was observed for individuals with underweight from Argentina (-2.0 kg). Several countries did not have any GWG data in this period for some BMI categories (S2 Table in S1 File).
Chile, Peru, and Mexico had a very low sample size available for several BMI categories for the second trimester. For individuals with underweight, the lowest median GWG was observed in Chile (2.8 kg). For normal weight, the median GWG for Panama (0.7 kg) was the lowest, and for overweight and obesity, the lowest median (0 kg) was observed for adolescents from Peru and Panama, respectively (S2 Table in S1 File).
In the third trimester of pregnancy, the countries with the lowest median GWG by BMI category were Panama for normal weight (8.4 kg), Panama and Paraguay for overweight (both 7 kg), and Paraguay for obesity (6 kg). Adolescents with underweight from Brazil and Paraguay presented the highest median GWG (17.2 and 17.0 kg, respectively) (S2 Table in S1 File).
The imputation process was performed only for the underweight and overweight categories because, in the remaining categories, the models presented an adequate fit. The proportion of missing data in the first trimester was similar across the pre-pregnancy BMI categories and varied from 10.4% (underweight) to 12.8% (overweight). The imputation procedure completed the distribution of weight gain in the first trimester (between 5 and 13 weeks) with values within the expected range for the period (Fig 4). When the median GWG in the first trimester was compared to the values after imputation, the imputed values were very similar to those originally observed in each country (S3 Table in S1 File).

Discussion
The final dataset compiled in this project from nine Latin American countries results from a rigorous data harmonization process. The homogeneity of the GWG and height-for-age z scores and similar maternal age distributions across the included countries reinforce the possibility of combining the data into a single dataset to construct GWG graphs for adolescents aged 10-19 years.
This study revealed that approximately 10.2% of pre-pregnancy weights were missing. The high proportion of missing GWG data in the first trimester of pregnancy represented a challenge when constructing the GWG charts and needed to be adequately treated. We tested several methods adapted to deal with MNAR and decided to use imputation under the normal linear model with bootstrap to create the first-trimester measurements used to calculate GWG in this period. The lack of data in the first trimester is a recurrent problem. Yang et al. [22] also proposed an imputation approach to deal with missing first-trimester weight data but considered only imputing weight in a specific week (the 9th gestational week). Imputing a specific week would not solve the need for data throughout the first trimester to construct the charts. Other authors designing GWG curves for adults also faced this methodological problem and decided not to impute a first-trimester weight measurement [23,24].
The absence of data for GWG in the first trimester is consistent with previous studies from Latin America that show worrying figures on the late attendance of adolescents to the first prenatal care visit [25-27], the low coverage of prenatal care programs in the region [28], and the possible difficulties for timely access to health services [29,30]. Each government must establish goals for monitoring the quality and quantity of the data collected in each trimester of pregnancy, and special attention must be paid to adolescents.
Creating GWG curves requires a sufficient sample size across all pre-pregnancy BMI categories. Therefore, the low prevalence of adolescents classified as having underweight (1.5%) and obesity (5.3%) according to pre-pregnancy BMI is worth mentioning. These prevalences are lower than those observed by Samano et al. [31] in Mexico (5.0 and 10.1%, respectively, using the WHO criteria). Pregnant individuals with underweight and obesity, especially adolescents, are usually considered at a higher risk for adverse outcomes and are monitored in specialized prenatal care services. Therefore, they might not have been captured in the original studies that comprised the dataset used in the current investigation. Additionally, because the aim is to construct prescriptive GWG curves, individuals who gave birth to neonates with weight outside the 2,500-4,000 g interval had to be removed. Adopting these criteria may have reduced the available sample size in these two BMI categories, but including such individuals would result in an inappropriate dataset for constructing such curves.
The prevalence of adolescents in each BMI category in the harmonized dataset is similar to values reported for adults in Latin America, especially regarding normal weight and overweight. In Colombia, Benjumea and Bermúdez [5] reported a prevalence of 45.8% for normal weight and 24.7% for overweight. In Mexico, when comparing the pre-pregnancy BMI of adolescent mothers aged 15 (14-16) years using three different classifications, it was observed that 73.4% of the individuals were classified with normal weight, which is very close to the prevalence observed in the current study (71.8%) [31].
The values for GWG in the first (before imputation) and second trimesters in most countries included in this study are lower than those observed for adults in other Latin American countries. Among adult pregnant individuals registered in a Brazilian surveillance system, Carrilho et al. [32] observed that, in 2018, the mean GWG varied between 0.8-1.7 kg in the first trimester and 3.8-6.0 kg in the second trimester. Several countries had a median GWG of 0 kg in the current study until the second trimester of pregnancy. For the third-trimester GWG, the values observed in the harmonized dataset are similar to those reported for total GWG for adults by Wang et al. [33]. The authors used data from the Demographic Health Surveys to estimate the total GWG for 2015 for several regions. The projected mean for Latin American and Caribbean countries was 11.8 kg (95% uncertainty range 6.2-17.4 kg). This interval includes all the medians observed in the current study for GWG in the third trimester for all BMI categories. The low median GWG observed in the first and second trimesters and the values for the third trimester closer to the total recommended GWG for adults suggest that these adolescents have a different weight gain pattern during pregnancy, with higher rates in the third rather than the second trimester, as is usually expected for pregnant adults [9].

Strengths and limitations
This is the first study in Latin America to combine datasets from several countries to create a robust pooled dataset to analyze GWG among adolescents and the first initiative to construct GWG charts for this group. Several GWG curves for adults have been created since 2013 [23,24,34,35], and individuals < 19 years old were excluded from all of them. The rigorous harmonization of the dataset, with conferences by the team of researchers, continuous discussions, and visual inspections of the data, is a strength of this study.
However, some limitations need to be pointed out. One of the main challenges in constructing this dataset was the non-existence of databases in some Latin American countries and the low quality of data from national information systems, highlighting the need to implement and standardize adequate perinatal systems. Thus, it is difficult to evaluate the generalizability of the final dataset because nationally representative studies of adolescents against which to compare their characteristics are scarce. However, it is important to mention that we did not aim to have a representative sample of adolescent individuals from each selected country or from Latin America as a whole.
The lack of data in the first trimester, which can reflect a feature of this group, is a significant limitation, which we tried to overcome with multiple imputation techniques. These methods can be extended to similar situations in this field.
Due to the different data sources used to construct the harmonized dataset and the lack of information, it is not possible to know the source of the pre-pregnancy weight data used (measured, abstracted from medical records, or self-reported) or the period to which the pre-pregnancy weight refers, i.e., immediately before pregnancy, six months before, a year before, etc. Although self-reported pre-pregnancy weight has a high agreement with measured first-trimester weight in Brazil and could be used to calculate pre-pregnancy BMI and GWG [36], the quality of other types of pre-pregnancy weight is unknown. The source of the birth weight and length data is also unknown. The lack of data on sociodemographic characteristics of adolescents is also a limitation when working with a dataset resulting from a combination of multiple sources.
Finally, the absence of data on height at the end of pregnancy did not allow us to evaluate whether growth continues during pregnancy and how it affects GWG. This is a fundamental point when working with pregnancy in adolescence, especially at younger ages. Future studies with this group could incorporate at least one height evaluation at the end of pregnancy to allow for that possibility.

Conclusion
The main outcome of this study is a harmonized and homogeneous dataset from nine countries, with imputed data in the first trimester of pregnancy, prepared for constructing Latin American GWG charts for adolescents. This is the first Latin American initiative to consolidate a dataset of pregnant adolescents that overcame problems in data harmonization, such as identifying implausible values, the assessment of heterogeneity, and the high proportion of missing data in the first trimester of pregnancy. In Latin America, the information on the nutritional status of pregnant adolescents is limited, and the quality of the data from national information systems is far from ideal; however, with this rigorous harmonization process, we were able to obtain a large international dataset that could be used to construct unprecedented GWG curves for this group.